

# Accelerating DICe on FPGA

Keaten Stokke & Atiyeh Panahi

Advisor: Dr. David Andrews

October 29, 2018



### Last Report

- A basic demo of DICe with two small frames
  - 64 $\times$ 48 pixels (135 times fewer pixels than the ideal 896 $\times$ 464)
  - Almost 3 times faster than the software version.
  - Resource utilization of 10 percent or less
    - LUTs 9%, DSP 1%, BRAM 10%







| Clock frequency | HDL execution time | DICe software execution time |
|-----------------|--------------------|------------------------------|
| 100 MHz         | 3.5 ms             | 13 ms                        |
| 150 MHz         | 2.5 ms             | 13 ms                        |
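As a quick sanity check on the timings above, the sketch below converts each HDL execution time into a clock-cycle count and a speedup over the software run. The numbers are copied from the table; the helper names are illustrative only, and since the reported times look rounded, the cycle counts should be read as approximate.

```python
SOFTWARE_MS = 13.0                 # DICe software execution time from the table
RUNS = [(100, 3.5), (150, 2.5)]    # (clock in MHz, HDL execution time in ms)


def cycles(clock_mhz, time_ms):
    """Clock cycles elapsed: MHz * ms = 1e6 cycles/s * 1e-3 s = 1e3 cycles."""
    return round(clock_mhz * time_ms * 1_000)


def speedup(software_ms, hardware_ms):
    """How many times faster the hardware run is than the software run."""
    return software_ms / hardware_ms


summary = [
    {
        "clock_mhz": mhz,
        "cycles": cycles(mhz, ms),
        "speedup": speedup(SOFTWARE_MS, ms),
    }
    for mhz, ms in RUNS
]

for row in summary:
    print(f"{row['clock_mhz']} MHz: {row['cycles']} cycles, "
          f"{row['speedup']:.1f}x faster than software")
```

The cycle counts at the two clock rates are close to each other, which suggests the design scales roughly linearly with frequency over this range.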



### What has been done

- Hardware
- Verilog Code
- Input/Output modules
- Test size



### Hardware

- Number of subsets: 8 to 14
- Number of frames: several tens of thousands
- Subset size: 648 to 3450 pixels
  - BRAM for subset coordinates = $14 \times 3450 \times 19 \text{ bits} = 0.1 \text{ MB}$
- Image size: 896 $\times$ 464 pixels
  - BRAM for two input frames = $896 \times 464 \times 32 \text{ bits} = 1.6 \text{ MB per frame} \rightarrow 3.2 \text{ MB}$
  - BRAM for gradients = 1.6 MB
  - BRAM for parameters = $11 \times 32 \text{ bits} = 44 \text{ bytes}$ (negligible)
- Total BRAM needed = 4.9 MB (more than either device provides)
- Available Kintex-7 BRAM = 1.95 MB
- Available Virtex-7 BRAM = 4.5 MB
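The BRAM budget above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the design: it assumes the products on the slide are in bits, that 1 MB means $2^{20}$ bytes (which matches the rounded figures), and that the 19-bit coordinate width comes from storing an x and a y index side by side. All variable names are illustrative.

```python
import math

MB = 2 ** 20  # bytes; assumption: binary megabytes, which match the slide's rounding


def bits_to_mb(bits):
    """Convert a size in bits to megabytes."""
    return bits / 8 / MB


width, height = 896, 464
max_subsets, max_subset_pixels = 14, 3450
word_bits = 32

# Bits needed to address one pixel (assumption: x and y indices packed together).
coord_bits = math.ceil(math.log2(width)) + math.ceil(math.log2(height))

budget_mb = {
    "subset coordinates": bits_to_mb(max_subsets * max_subset_pixels * coord_bits),
    "two input frames": bits_to_mb(2 * width * height * word_bits),
    "gradients": bits_to_mb(width * height * word_bits),
    "parameters": bits_to_mb(11 * word_bits),
}
total_mb = sum(budget_mb.values())

available_mb = {"Kintex-7": 1.95, "Virtex-7": 4.5}
fits = {device: total_mb <= capacity for device, capacity in available_mb.items()}

for name, size in budget_mb.items():
    print(f"{name:>20}: {size:.3f} MB")
print(f"{'total':>20}: {total_mb:.1f} MB")
for device, ok in fits.items():
    print(f"{device}: {'fits' if ok else 'does not fit'}")
```

Under these assumptions the coordinate width works out to 10 + 9 = 19 bits, and the full-size design does not fit in the on-chip BRAM of either device.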







## Verilog Code

- How to add more options
  - Optimizing arithmetic functions
    - Even a single clock cycle saved makes a large difference
  - Arbitrary subset shapes
    - Determine how the DICe C++ code handles them
  - More frames
    - Check which changes are needed
  - More subsets
    - IP code changed
  - Computing the parameters needed for the output file
    - Sigma, Beta, Gamma
  - Obstruction feature
    - Not yet implemented
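To illustrate why a single clock cycle matters, the sketch below estimates the time added by one extra cycle per pixel. It is a rough model under stated assumptions: the arithmetic is not fully pipelined (so the extra cycle is paid on every pixel), the frame is the full 896 $\times$ 464 size, the clock is 150 MHz, and 20,000 frames stands in for "several tens of thousands". The function name is hypothetical.

```python
def extra_time_s(extra_cycles_per_pixel, pixels_per_frame, frames, clock_hz):
    """Time added when every pixel costs a few extra clock cycles."""
    return extra_cycles_per_pixel * pixels_per_frame * frames / clock_hz


CLOCK_HZ = 150e6
PIXELS_PER_FRAME = 896 * 464
FRAMES = 20_000  # assumption: illustrative frame count

per_frame_ms = extra_time_s(1, PIXELS_PER_FRAME, 1, CLOCK_HZ) * 1e3
total_s = extra_time_s(1, PIXELS_PER_FRAME, FRAMES, CLOCK_HZ)

print(f"one extra cycle per pixel: {per_frame_ms:.2f} ms per frame")
print(f"over {FRAMES} frames: {total_s:.1f} s")
```

In a fully pipelined datapath an extra stage would add latency only once rather than per pixel, so the real cost depends on how the arithmetic units are structured.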



## Input/Output modules

#### Ethernet

- License problems
- Still debugging!





### Test size

- Larger input frames
  - 320  $\times$  240 pixels (5.4 times smaller than the ideal)
  - 1 subset





### Test size (continued)

- Larger input frames
  - Execution time and resource utilization

| Clock frequency | HDL execution time | DICe software execution time |
|-----------------|--------------------|------------------------------|
| 150 MHz         | 62.21 ms           | 173 ms                       |

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 20903       | 303600    | 6.89          |
| LUTRAM   | 404         | 130800    | 0.31          |
| FF       | 14303       | 607200    | 2.36          |
| BRAM     | 307         | 1030      | 29.81         |
| DSP      | 12          | 2800      | 0.43          |
| IO       | 3           | 700       | 0.43          |
| BUFG     | 6           | 32        | 18.75         |
| MMCM     | 1           | 14        | 7.14          |
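The figures in the two tables above can be cross-checked with a short script. This sketch recomputes the utilization percentages and the speedup, and converts the BRAM block counts to megabytes. The conversion assumes each BRAM block holds 36 Kb, as on Xilinx 7-series devices, and that 1 MB means $2^{20}$ bytes; the names are illustrative.

```python
# (used, available) per resource, copied from the utilization table
RESOURCES = {
    "LUT": (20903, 303600),
    "LUTRAM": (404, 130800),
    "FF": (14303, 607200),
    "BRAM": (307, 1030),
    "DSP": (12, 2800),
    "IO": (3, 700),
    "BUFG": (6, 32),
    "MMCM": (1, 14),
}

BRAM_BLOCK_BITS = 36 * 1024  # assumption: 36 Kb per BRAM block
MB = 2 ** 20                 # bytes


def utilization_percent(used, available):
    """Utilization rounded to two decimals, as in the table."""
    return round(100 * used / available, 2)


def bram_blocks_to_mb(blocks):
    """Capacity of a number of BRAM blocks in megabytes."""
    return blocks * BRAM_BLOCK_BITS / 8 / MB


percent = {name: utilization_percent(u, a) for name, (u, a) in RESOURCES.items()}
bram_used_mb = bram_blocks_to_mb(RESOURCES["BRAM"][0])
bram_total_mb = bram_blocks_to_mb(RESOURCES["BRAM"][1])
speedup = 173 / 62.21  # software time / HDL time at 150 MHz

for name, value in percent.items():
    print(f"{name:>6}: {value:.2f} %")
print(f"BRAM used: {bram_used_mb:.2f} MB of {bram_total_mb:.1f} MB")
print(f"speedup over software: {speedup:.2f}x")
```

With the 36 Kb assumption, the 1030 available blocks come to about 4.5 MB, which agrees with the Virtex-7 capacity quoted on the Hardware slide.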





## Next Steps

- Fixing Ethernet problems
- Supporting larger images
- Handling a much larger frame count
- Handling more subsets of various shapes
- Python program to act as an intermediary
  - Startup, video conversion, FPGA communication
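As one possible shape for the FPGA-communication part of the Python program, the sketch below packs a grayscale frame into 32-bit words and splits it into numbered packets. Everything here is a hypothetical illustration: the one-word-per-pixel layout follows the 32-bit width used in the BRAM estimate, while the little-endian byte order, the 1024-byte payload, and the sequence-number header are assumptions, not the project's actual protocol. The transport itself (for example a UDP socket) is left out.

```python
import struct

MAX_PAYLOAD_BYTES = 1024  # assumption: payload small enough for one Ethernet frame


def pack_frame(pixels):
    """Pack grayscale pixel values into little-endian 32-bit words, one per pixel."""
    return struct.pack(f"<{len(pixels)}I", *pixels)


def split_into_packets(payload, max_bytes=MAX_PAYLOAD_BYTES):
    """Split a payload into chunks, each prefixed with a 32-bit sequence number."""
    packets = []
    for seq, start in enumerate(range(0, len(payload), max_bytes)):
        header = struct.pack("<I", seq)
        packets.append(header + payload[start:start + max_bytes])
    return packets


# Synthetic 320 x 240 test pattern standing in for a converted video frame.
width, height = 320, 240
frame = [(x + y) % 256 for y in range(height) for x in range(width)]

packets = split_into_packets(pack_frame(frame))
print(f"{len(packets)} packets of up to {len(packets[0])} bytes")
```

The sequence number lets the receiving side detect missing or reordered packets, which is useful when debugging an unreliable link.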



## Conclusion

Thank You!